pixel distance
A Unified Analysis of Mixed Sample Data Augmentation: A Loss Function Perspective
We propose the first unified theoretical analysis of mixed sample data augmentation (MSDA), such as Mixup and CutMix. Our theoretical results show that, regardless of the choice of mixing strategy, MSDA behaves as a pixel-level regularization of the underlying training loss and a regularization of the first-layer parameters. They also support that MSDA training can improve adversarial robustness and generalization compared to vanilla training. Using these results, we provide a high-level understanding of how different design choices of MSDA behave differently. For example, we show that the two most popular MSDA methods, Mixup and CutMix, differ in that CutMix regularizes the input gradients by pixel distances, while Mixup regularizes the input gradients regardless of pixel distances. Our analysis also shows that the optimal MSDA strategy depends on the task, dataset, and model parameters. From these observations, we propose generalized MSDAs: a hybrid version of Mixup and CutMix (HMix) and Gaussian Mixup (GMix), simple extensions of Mixup and CutMix. Our methods leverage the advantages of both Mixup and CutMix while remaining efficient, with computational overhead almost as negligible as that of Mixup and CutMix. Our empirical study shows that HMix and GMix outperform previous state-of-the-art MSDA methods on CIFAR-100 and ImageNet classification tasks.
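As context for the Mixup/CutMix comparison in the abstract, the two standard mixing operations can be sketched as follows. This is a minimal NumPy illustration of the well-known baseline strategies only, not the paper's HMix/GMix code; the image shapes and the patch-size convention are illustrative assumptions.

```python
import numpy as np

def mixup(x1, x2, lam):
    """Mixup: blend every pixel by the same coefficient lam,
    so the mixing is independent of pixel location."""
    return lam * x1 + (1.0 - lam) * x2

def cutmix(x1, x2, lam, rng=None):
    """CutMix: paste a rectangular patch of x2 into x1.
    The patch covers roughly a (1 - lam) fraction of the image,
    so the mixing depends on pixel positions (inside vs. outside
    the box)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = x1.shape[:2]
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    top = int(rng.integers(0, h - cut_h + 1))
    left = int(rng.integers(0, w - cut_w + 1))
    mixed = x1.copy()
    mixed[top:top + cut_h, left:left + cut_w] = x2[top:top + cut_h, left:left + cut_w]
    return mixed
```

In both cases the labels are mixed with the same coefficient lam; the contrast the abstract draws is visible above: Mixup applies one global blend, while CutMix's effect on a pixel depends on where it sits relative to the pasted box.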
Anticipation through Head Pose Estimation: a preliminary study
Figari Tomenotti, Federico, Noceti, Nicoletta
Abstract--The ability to anticipate others' goals and intentions [...]. More in detail, we hypothesize we can use the 3D head direction as a proxy of the gaze, and that by deriving simple visual geometrical cues in an unsupervised way - connecting the head and hands of a subject with the elements in the environment - we can anticipate the goal of an action in terms of next active object or target position (when the movement involves a change in location of objects). The goal is achieved using object and human pose detectors, deriving the 3D head pose and reasoning on the interaction between the human and [...]. To test this hypothesis, we conducted preliminary experiments using a private dataset including videos of different subjects sitting in front of a table [...].
I. A key element of natural human-human interaction is the ability to anticipate humans' goals and intentions [13]. The same ability is paramount in different application domains - ranging from gaming to domotics and home assistance, to robotics. In the latter, in particular, anticipation abilities may enable robots to seamlessly interact with humans in shared environments, enhancing safety, efficiency and fluidity in Human-Robot Interaction scenarios [8]. Over the last years, the importance of leveraging non-verbal cues for understanding humans' intentions has been well assessed [2, 3].
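The "simple geometrical cues" described above can be pictured with a small sketch: given an estimated 3D head direction (the gaze proxy) and candidate object positions, score each object by how well it aligns with the head direction. This is a hypothetical illustration of the general idea, not the authors' pipeline; the vector representation and the cosine-similarity scoring are assumptions.

```python
import math

def predict_next_active_object(head_pos, head_dir, objects):
    """Score each object by the cosine of the angle between the head
    direction (gaze proxy) and the head-to-object vector; the most
    aligned object is predicted as the next active object.

    head_pos, head_dir: 3D tuples; objects: {name: 3D position}."""
    def normalize(v):
        n = math.sqrt(sum(c * c for c in v))
        return tuple(c / n for c in v)

    d = normalize(head_dir)
    best, best_score = None, -2.0  # cosine similarity lies in [-1, 1]
    for name, pos in objects.items():
        to_obj = normalize(tuple(p - h for p, h in zip(pos, head_pos)))
        score = sum(a * b for a, b in zip(d, to_obj))
        if score > best_score:
            best, best_score = name, score
    return best
```

A richer version could combine this gaze cue with hand-to-object distances, as the abstract's mention of head and hand cues suggests.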
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (0.95)
- Information Technology > Sensing and Signal Processing > Image Processing (0.94)
- Information Technology > Artificial Intelligence > Vision > Video Understanding (0.41)
Monitoring Social Distancing Using People Detection (Part -II)
Continuing from my previous article, where I explained the theoretical part of our object detection model, here I will explain how to actually implement our social distance monitoring tool. As already discussed, we first have to detect people and then apply some heuristics on top of the detections to achieve our goal. For people detection, we will use Facebook's Detectron library, which provides trained RetinaNet weights suitable for detecting people. After detecting all the people in a given frame, we will use simple pixel distances to measure how far each person is from every other person. We can then apply a threshold to each pairwise distance to decide whether two people are too close to each other.
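The distance-and-threshold step above can be sketched in a few lines. This is a minimal sketch assuming the detector has already returned per-person bounding boxes as (x1, y1, x2, y2) pixel tuples; the function and parameter names are my own, not Detectron's API.

```python
import math
from itertools import combinations

def box_center(box):
    """Center of an (x1, y1, x2, y2) bounding box in pixels."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def flag_close_pairs(boxes, threshold_px):
    """Return index pairs of detected people whose box centers are
    closer than threshold_px pixels (a simple proximity heuristic)."""
    centers = [box_center(b) for b in boxes]
    close = []
    for (i, ci), (j, cj) in combinations(enumerate(centers), 2):
        if math.hypot(ci[0] - cj[0], ci[1] - cj[1]) < threshold_px:
            close.append((i, j))
    return close
```

Note that raw pixel distance ignores perspective: two people far from the camera look closer together than two people near it, so in practice the threshold may need calibration per camera view.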
D-square-B: Deep Distribution Bound for Natural-looking Adversarial Attack
Xu, Qiuling, Tao, Guanhong, Zhang, Xiangyu
We propose a novel technique that can generate natural-looking adversarial examples by bounding the variations induced in the internal activation values of some deep layer(s), through a distribution quantile bound and a polynomial barrier loss function. By bounding model internals instead of individual pixels, our attack admits perturbations closely coupled with the existing features of the original input, allowing the generated examples to look natural while having diverse and often substantial pixel distances from the original input. Enforcing per-neuron distribution quantile bounds addresses the non-uniformity of internal activation values. Our evaluation on ImageNet and five different model architectures demonstrates that our attack is quite effective. Compared to state-of-the-art pixel-space, semantic, and feature-space attacks, our attack achieves the same attack success/confidence level while producing much more natural-looking adversarial perturbations. These perturbations piggyback on existing local features and do not have any fixed pixel bounds.
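The per-neuron quantile bound can be pictured with a small sketch: estimate low/high activation quantiles for each neuron from benign inputs, then keep perturbed activations inside that per-neuron range. This is a hypothetical NumPy illustration of the bounding idea only; the paper's polynomial barrier loss and attack optimization are not shown, and the quantile level is an assumption.

```python
import numpy as np

def quantile_bounds(acts, q=0.05):
    """Per-neuron activation bounds estimated from benign samples.
    acts: (n_samples, n_neurons) activations in some deep layer.
    Returns lower and upper quantiles per neuron, so neurons with
    very different activation scales each get their own range."""
    lo = np.quantile(acts, q, axis=0)
    hi = np.quantile(acts, 1.0 - q, axis=0)
    return lo, hi

def bound_activations(perturbed, lo, hi):
    """Project perturbed activations back into each neuron's range,
    constraining the attack in feature space rather than pixel space."""
    return np.clip(perturbed, lo, hi)
```

Because the constraint lives in activation space, the corresponding input-space perturbation is free to be large in pixel distance while still tracking the input's existing features, which matches the behavior the abstract describes.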
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Macao (0.04)
- Asia > China (0.04)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)